Who Wrote This Code? Identifying the Authors of Program Binaries

Authors

  • Nathan E. Rosenblum
  • Xiaojin Zhu
  • Barton P. Miller
Abstract

Program authorship attribution—identifying a programmer based on stylistic characteristics of code—has practical implications for detecting software theft, digital forensics, and malware analysis. Authorship attribution is challenging in these domains where usually only binary code is available; existing source code-based approaches to attribution have left unclear whether and to what extent programmer style survives the compilation process. Casting authorship attribution as a machine learning problem, we present a novel program representation and techniques that automatically detect the stylistic features of binary code. We apply these techniques to two attribution problems: identifying the precise author of a program, and finding stylistic similarities between programs by unknown authors. Our experiments provide strong evidence that programmer style is preserved in program binaries.

Similar resources

Poster: Atoms of Style: Identifying the Authors of Program Binaries

Being able to identify the author of a program has many applications in both academic and commercial environments. In most use cases, the source code is readily available, and this is reflected in the literature, as previous work has mostly focused on source code analyses. In contrast, scant research has been carried out on identifying the authors of executable program binaries. This would be m...

Machine Learning-Assisted Binary Code Analysis

Binary code analysis is a foundational technique in the areas of computer security, performance modeling, and program instrumentation. In computer security, such analysis can provide the basis for detecting, understanding and controlling malicious code. Any analysis of malicious program requires as a first step precisely locating the Function Entry Points (FEPs, the starting byte of each functi...

When Coding Style Survives Compilation: De-anonymizing Programmers from Executable Binaries

The ability to identify authors of computer programs based on their coding style is a direct threat to the privacy and anonymity of programmers. Previous work has examined attribution of authors from both source code and compiled binaries, and found that while source code can be attributed with very high accuracy, the attribution of executable binary appears to be much more difficult. Many pote...

BYTEWEIGHT: Learning to Recognize Functions in Binary Code

Function identification is a fundamental challenge in reverse engineering and binary program analysis. For instance, binary rewriting and control flow integrity rely on accurate function detection and identification in binaries. Although many binary program analyses assume functions can be identified a priori, identifying functions in stripped binaries remains a challenge. In this paper, we pro...

Jacobite Explanation of the Trinity in the Context of Muʿtazilite Theology: Abu Raʾitah al-Takriti

The Melkites, Jacobites, and Nestorians were the main Christian communities under Muslim rule. Several pre-Islamic Arab Christian authors wrote treatises concerning their beliefs in Arabic, some of which date back to the early Islamic centuries. The multiplicity of such polemical works suggests an intellectually open society and a degree of tolerance shown by Muslim leaders. Abu Raʾita...

Publication date: 2011